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ABSTRACT 
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objects  in  a  sequence  of  images  are  developed.  The  algorithms  considered  consist  of:  the 
parallel  extended  Kalman  filter  method;  the  spatiotemporal  gradient  methods;  the 
spatiotemporal  frequency  methods;  and  the  one-dimensional  FFT  methods.  These  algorithms 
are  designed  to  perform  on  low  signal  to  noise  ratio  images.  Each  of  these  algorithms  is 
applied  to  a  sequence  of  computer  generated  images  with  varying  signal  to  noise  ratios. 
Simulations  are  used  to  evaluate  the  performance  of  each  algorithm. 
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I.   INTRODUCTION 

The  measurement  of  displacement  between  successive  frames  in  an  image  sequence 
gives  information  about  the  motion  (or  velocity)  of  the  objects  in  these  images.  The 
motion  information  is  an  important  element  in  the  source  coding,  restoration,  analysis 
and  interpretation  of  time-varying  imagery,  since  a  time-varying  scene  can  be 
characterized  to  a  large  extent  by  the  motion  it  contains.  The  characterization  may  lead 
to  detection  of  objects  (targets,  their  velocities  and  trajectories),  it  can  be  used  to 
distinguish  between  scene  and  noise  components  in  order  to  reduce  additive  noise  in 
sequences  and  furthermore  can  be  used  as  a  cue  for  image  segmentation.  Motion 
imformation  may  also  be  useful  for  the  compact  representation  of  the  image  sequences 
for  transmission  and  storage  purposes  (as  in  motion-compensated  interframe  coders).  The 
motion  patterns  in  the  image  sequence  may  be  due  to  the  motion  of  objects  in  the  scene, 
the  motion  of  the  observer  or  both.  Also,  the  motion  may  be  apparent  motion  where  a 
change  in  the  image  intensity  between  frames  gives  the  illusion  of  motion. 

There  are  4  basic  approaches  for  motion  estimation: 

(1)  feature  correspondence  (matching), 

(2)  difference  picture  methods, 

(3)  spatio-temporal  gradients, 

(4)  Fourier  phase  changes. 

Matching  methods  use  algorithms  for  matching  the  components  of  objects  between  two 


successive  frames  of  a  scene  containing  moving  objects.  This  leads  to  an  optimization 
process  which  can  easily  be  solved.  The  drawback  to  this  approach  is  that  identifiable 
features  must  be  derived  from  the  image  before  matching  can  be  tried.  Also  the  spatial 
range  over  which  image  flow  is  derived  is  short  in  comparison  to  the  size  of  the  region 
in  which  identifiable  features  exist.  Difference  picture  methods  start  with  a  pixel-wise 
difference  between  successive  frames  in  an  image  sequence.  The  differences  can  be 
thresholded  to  produce  a  binary  motion  image.  Since  difference  picture  methods  do  not 
take  the  direction  or  speed  of  the  motion  into  account,  the  methods  are  not  easily  applied 
when  the  background  is  moving  and  can  not  easily  differentiate  between  different  objects 
moving  with  different  velocities.  The  problem  with  difference  picture  methods  is  that 
they  are  local  and  do  not  combine  spatial  and  temporal  changes  over  neighborhoods  to 
improve  the  information  on  the  velocity  field.  Spatio-temporal  gradient  methods  relate 
the  spatial  and  temporal  gradient  at  each  pixel  in  the  image  plane  to  the  instantaneous 
velocity  at  that  pixel  point.  This  forms  a  two-dimensional  vector  field  called  the 
displacement  field  (velocity  field  or  optical  flow).  The  Fourier-phase  methods  utilize  the 
shift  property  of  Fourier  transform  and  generate  a  solution  based  on  the  spatial-frequency 
domain  data. 

In  this  thesis,  some  of  the  current  motion  estimation  algorithms  are  introduced, 
discussed,  and  improved.  A  comparison  between  algorithms  is  established.  Chapter  II 
includes  the  application  of  the  parallel  extended  Kalman  filter.  The  Extended  Kalman 
Filter  is  applied  to  sequential  images  containing  a  moving  object  to  estimate  object's  two- 
dimensional  velocity  components  and  to  increase  image  quality  by  the  reduction  of  noise 


for  low  signal  to  noise  images.  The  idea  of  using  an  extended  Kalman  filter  (EKF)  to 
enhance  the  quality  of  the  image  frame  and  provide  an  estimate  of  the  object  velocity  was 
proposed  by  Burl  [Ref.  1].  Chapter  III  includes  the  algorithms  that  utilize  the  Image 
Flow  Constraint  equation  which  relates  the  spatial  changes  in  an  image  frame  to  the 
temporal  changes  between  the  frames.  Several  ways  for  solving  this  equation  are 
discussed.  Chapter  IV  and  V  address  the  Fourier  transform  methods  of  motion 
estimation.  These  relate  phase  changes  in  the  Fourier  domain  representation  of  image 
frames  to  the  velocity  of  objects  contained  in  the  image  frames.  The  algorithms  discussed 
in  this  thesis  are  simulated  under  different  background  and  noise  conditions  to  evaluate 
their  performance.  Chapter  VI  includes  the  simulation  results  and  the  final  comments. 
In  this  thesis,  a  moving  image  model  is  defined  as  a  series  of  image  frames 
containing  a  moving  object.  An  image  frame  consists  of  a  two-dimensional  array  where 
the  individual  array  elements  are  defined  as  pixels.  Each  pixel  is  given  a  numerical  value 
based  on  the  intensity  of  the  image  at  that  point. 


H.    THE  EXTENDED  KALMAN  FILTER 

A.      INTRODUCTION 

An  extended  Kalman  filter  (EKF)  may  be  developed  for  the  estimation  of  both  the 
image  and  velocity  components  from  a  sequence  of  images  in  a  nearly  optimal  manner 
[Ref.  1].  First  the  EKF  requires  a  dynamic  model  that  describes  the  evolution  of  the 
sequential  image  frames.  A  simple  dynamic  model  consists  of  a  shift  operator  in  the 
spatial  domain  which  describes  the  motion  of  the  object.  Unfortunately  the  EKF 
developed  from  this  model  requires  a  prohibitive  number  of  computations.  This  difficulty 
is  overcome  by  transforming  both  image  and  EKF  into  the  spatial  frequency  domain.  The 
two-dimensional  discrete  Fourier  transform  is  used  to  transform  these  two-dimensional 
image  frames  from  the  spatial  domain  to  the  spatial  frequency  domain.  By  transforming 
the  image  frame  to  the  spatial  frequency  domain,  the  shift  operator  in  the  spatial  domain 
becomes  a  phase  shift  operator  in  the  spatial  frequency  domain.  The  shift  of  each  spatial 
frequency  is  given  by  the  scalar  product  of  the  velocity  and  the  frequency.  The  EKF  can 
be  applied  to  each  of  the  spatial  frequencies  separately.  The  number  of  calculations  can 
then  be  reduced  by  limiting  the  number  of  spatial  frequencies  that  are  analyzed. 


B.      THE  MOVING  IMAGE  MODEL 


1.       The  Moving  Image  Model  in  the  Spatial  Domain 

The  evolution  of  successive  image  frames  containing  a  moving  object  is 
modeled  by  a  nonlinear  state  equation  in  the  spatial  domain.  The  background  of  the 
image  is  assumed  to  be  zero  to  simplify  the  initial  presentation.  A  new  model  for  the 
images  with  background  will  be  presented  later.  Also  the  moving  object  is  assumed  to 
be  completely  contained  in  the  image.  So  the  state-space  equation  is  defined  as, 


x(k+l)  = 


(2.1) 


flk+i)       s(v)o  Ak) 

v(fc+l)J      [  0    l\  [v(k) 
=  fl(v)  *(*), 
where  x(k)  is  the  state  vector  of  the  system, /(&,)  E  Ft™1  is  the  NxN  pixel  image  at  time 

k  stacked  into  a  vector  of  the  amplitudes  of  each  pixel,  v  G  R2  is  the  velocity  vector 
(consists  of  a  two-element  column  vector  describing  the  horizontal  and  vertical  velocities 
of  the  moving  object  in  terms  of  pixels  per  sampling  time)  and  S(v)  is  a  two-dimensional 
shift  operator  with  magnitude  and  direction  given  by  v.  The  vectors /and  v  combine  to 
form  the  state  vector*  G  R"*"*2.  Note  that  object  motion  is  equivalent  to  a  shift  operator 
applied  to  the  entire  image  since  the  background  is  zero.  Also  the  shift  operator  is  a 
circular  shift  operator  since  the  object  is  assumed  to  remain  within  the  boundary. 

Equation  2. 1  describes  a  sequence  of  identical  replications  of  an  object  as  it 
moves  accross  an  image  frame  with  constant  velocity.  A  more  realistic  model  would  be 
to  include  changes  in  the  object  due  to  rotations,  change  in  the  aspect  angles  etc. 
Additionally,  changes  in  the  object's  velocity  may  occur.  A  new  model  which  includes 


these  changes  may  be  developed  as, 


x(k+\)  = 


x(k) 


CM 


(2.2) 


5(v)  0 
0    / 

=  a(y)  x(k)  ♦  C(*), 
where  the  plant  noise  is  composed  of  the  image  plant  noise,  £  and  the  velocity  plant 
noise,  £,.  The  plant  noise  is  assumed  to  be  a  zero-mean,  white,  Gaussian  random  noise 
with  a  covariance  matrix  defined  as, 


Qrr   =E 


■cc 


[c;  q 

= 

aj  I    0 
0    a2vI 

(2.3) 


where  E[»]  is  the  expectation  operator  and  /  is  the  identity  matrix.  The  measurement  of 
the  image,  y(k),  is  defined  by  the  corresponding  measurement  equation, 


y(k)  =f(k)+n(k)  =  [/  0]x(k)+n(Q  =  C  x(k)+n(k), 


(2.4) 


where  v  E  RNxN  is  the  measured  image  and  n(k)  E  R**  is  the  measurement  noise  which 
is  assumed  to  be  a  zero-mean,  white,  Gaussian  random  process  with  a  known  covariance 


matrix, 


*„  =  E  [n(k)  n  T(k)}  =  oj  /. 


(2.5) 


2.       Derivation  of  the  Shift  Operator  in  the  Spatial  Frequency  Domain 

The  two-dimensional  image  frame  defined  as/fa.nj  where   0  <  n,  <  N-l 
and    0  <  n2  <  M7  is  transformed  from  the  spatial  domain  to  the  spatial  frequency 


domain  by  using  the  two-dimensional  discrete  Fourier  transform  (DFT), 


AM      AM  -An^kj 


nw  =  E  E^»,«2) ^  *        .  (26) 

nt=0     «2=0 


where   0  <  k,  <  N-l   and   0  <  k2  <  N-l. 

The  shift  operator  in  the  spatial  frequency  domain  is  derived  from  the  circular 
shift  property  of  the  discrete  Fourier  transform  [Ref.  2:p.  67].  The  circular  shift 
property  is  defined  as, 

*(  ((ri1-ml))Nt((n2-m2))N  )  ~  <*1  WJ*  X(kvk2), 

~  D(mltm2)  X(^,y, 

where  ((•))  is  the  modulus  operator,  WN  =  e'J2r/N  ,and  Dfm;,/nJ  is  the  transformed  shift 
operator.  Equation  2.7  indicates  that  a  circular  shift  in  the  spatial  domain  results  in  a 
phase  shift  in  the  spatial  frequency  domain.  The  original  image  frame,  xfn^nj,  is 
assumed  to  be  a  periodic  two-dimensional  sequence  with  the  fundamental  period  equal 
to  the  frame  dimensions.  The  object  moving  across  the  image  frame  then  describes  a 
circular  shift  [Ref.  2:pp.  61-62],  This  implies  that,  as  the  object  moves  off  one  edge  of 
the  screen,  it  reenters  the  screen  from  the  opposite  side.  Real  image  frames,  such  as  a 
radar  screen  or  video  display,  do  not  require  this  property  because  the  moving  object's 
size  will  likely  be  much  smaller  than  the  screen  dimensions. 

The  derivation  of  D(m,,mJ  begins  by  introducing  a  circular  shift  into  the 
original  image  frame,  f(nlyn^.  Equation  2.6  now  becomes, 


AM      AM  .n%(!th.*^b.) 

F'(W  =  E    E  An.-m^-m^  e         «     »  ,  (2-8) 

where  F'fk^kJ  is  the  circular- shifted,  transformed  image  frame.  The  amount  of  pixel 
movement  is  described  by  m,  for  the  horizontal  shift  and  m2  for  the  vertical  shift.  Letting 
l,=nrm,  and  l2=n2-m2  Equation  2.8  becomes, 

f'(*ijy=  E     E  AW"'"1    *        "    '•  (2'9) 

Rearranging  terms,  Equation  2.9  reduces  to, 


(2.10) 


-;2n(-li+-^) 

=  D(mvm2)  Fik^kJ, 
where  F(kukJ  is  the  transformed  image  frame  without  the  circular  shift.  This  proves  that, 
when  an  object  moves  across  an  image  frame  in  the  spatial  domain,  the  motion  is 
described  by  a  phase  shift  in  the  spatial  frequency   domain,  thereby  agreeing  with  the 
circular  shift  property  (Equation  2.8). 

3.       The  Moving  Image  Model  In  The  Spatial  Frequency  Domain 

The  spatial  domain  model  described  by  Equation  2.3  can  be  transformed  to 
the  spatial  frequency  domain.  The  transformation  is  accomplished  by  taking  the  discrete 
Fourier  transform  of  the  image  portion  of  these  equations  which  gives, 


F(*+l) 
v(Jt+l) 

= 

D(y)   0 
0      / 

F(k) 

+ 

z^k) 
Cv(*) 

X(fc+1)  =  A(v)  X(k)  +  Z(it) 

= 

«(X(*),Z(*)) 

(2.11) 


where, 


F(k)  =  ^{/(it)}  e  C*\ 


(2.12) 


z/*)  =  ^K/*)}  e  C"2, 


(2.13) 


Z>(v)  =  diagonal  [  J**"  ]. 


(2.14) 


and  a(*,*)  is  a  non-linear  operator.  The  vector,  f(k)  and  its  corresponding  plant  noise, 
X/k)  are  transformed  to  the  spatial  frequency  domain,  where  &{•}  is  the  two-dimensional 
discrete  Fourier  transform  operator  (Note  that  the  index  k  refers  to  the  sample  at  time 
k).  The  transformed  shift  operator,  D(v)  is  a  diagonal  matrix  where  the  diagonal  terms 
D{(v)  are  the  transformed  shift  operators  for  specified  spatial  frequencies  so  that  the 
corresponding  elements  w,  of  spatial  frequency  vector  w  is  defined  as, 


°>*  = 


2-nkj 


N 


N 


(2-15) 


The  image  plant  noise  remains  a  zero-mean,  white  Gaussian  random  process  since  it  is 
generated  by  a  linear  operation: 


Q„  =  Elzz7]  = 


all    0 

0     o2J 


(2.16) 


Additionally,  note  that  only  half  of  the  spatial  frequencies  need  to  be  stored  in  F(k)  due 
to  the  conjugate  symmetry  property  of  discrete  Fourier  transform  of  a  real  image  [Ref. 
3:p.  45]. 

The  corresponding  measurement  equation  in  the  spatial  frequency  domain  is: 

Y(k)  =  JFX*)  +  W), 

=  [10]  X(k)  +  N(k),  (2-17) 

=  C  X(k)  +  N(k), 

where  Y(k)  =  £y(k))  and  N(k)  =  ^v(k)),  and  Wis  a  zero-mean  gaussian  with  known 

covariance  matrix, 

Rm  =  E[N(k)  NT(k)]  =  a2HL  (2-18> 

In  practice,  the  image  is  measured  and  then  the  Fourier  transform  is  performed.  But  for 
simplicity,  we  will  assume  that  the  Fourier  transform  of  the  image  is  measured.  The  state 
equations  given  by  Equations  2.11  through  2.18  are  the  basis  for  implementing  the  EKF 
in  the  spatial  frequency  domain.  The  Equations  2.11  through  2.18  define  a  nonlinear  state 
space  model  of  a  sequence  containing  a  moving  object.  The  state  of  this  system  consist 
of  both  the  image  (in  the  spatial  frequency  domain)  and  the  velocity  of  the  moving 
object.  The  model  is  driven  by  white  noise.  Measurements  consist  of  the  image  in  the 
spatial  frequency  domain  corrupted  by  additive  noise.  The  extended  Kalman  filter  has 
found  wide  application  as  a  state  estimator  for  systems  of  the  form  described  above. 
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C.      THE  EXTENDED  KALMAN  FILTER 

1.       The  Extended  Kalman  Filter 

The  extended  Kalman  filter  is  a  nonlinear,  recursive  filter  that  can  be 
employed  to  generate  estimates  of  a  system  whose  evolution  is  governed  by  a  nonlinear 
state  equation.  The  extended  Kalman  filter  equations  for  the  moving  image  model  are: 
State  Prediction: 

X(k+l\k)  =  a(X(k\k),0),  (2-19) 

Y(k+l\k)  =  CX(k+l\k),  (2-2°) 

Covariance  Prediction: 

P(k+l\k)  =  A(X(k\k))P(k\k)AT(X(k\k))  +  Qu,  (2-21) 

Kalman  Gain: 

G(*+l)  =  P(k+l\k)CT[CP(k+l\k)CT  +  Rmy\  (2.22) 

State  Correction: 

X(£+l  |A:+1)  =X(k+l  \k)  +  G(k+l)  [Y(k+l)  -  Y(k+l  \k)],  (2-23) 

Covariance  Correction: 

P(k+l\k+l)  =  [I-G(k+l)C]P(k+l\k).  (2-24) 

The  A  denotes  an  estimate  and  the  notation  k+1  \k  is  read  as:  at  time  k+1  given  data 
through  time  k.  These  equations  are  a  set  of  recursive  equations  that  can  be  used  for 
estimating  the  state  X(k). 
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2.       The  Modified  Extended  Kalman  Filter 

As  mentioned  earlier,  the  diagonal  properties  of  the  transformed  shift 
operator,  D(v),  and  the  covariance  matrices  are  the  basis  for  the  parallel  structure  shown 
in  Figure  2.1.  This  parallel  structure  is  referred  to  as  the  modified  extended  Kalman 
filter.  Reference  1  outlines  how  the  EKF  converges  to  the  MEKF  as  the  velocity  estimate 
approaches  the  actual  velocity.  Practical  implementation  of  the  MEKF  suggests  that  each 
filter  in  the  parallel  bank  be  dedicated  to  a  specific  spatial  frequency.  Each  individual 
filter  is  referred  to  as  a  single  frequency  extended  Kalman  filter  (SFEKF).  The  state 
vector  for  a  specific  spatial  frequency  is  defined  as, 


Xfk)  = 


\m 

ReiFtf)) 

Xi2(k) 

= 

/m(F.(*)) 

Xu(k) 

T 
Ci),V 

(2.25) 


where  the  first  two  states,  X,(k)  and  X2(k)  are  the  real  and  imaginary  parts  of  Fourier 
coefficients  associated  with  a  specific  spatial  frequency  and  the  third  state,  Xs(k),  is  the 
product  of  the  spatial  frequency  vector  and  the  velocity  vector.  Xu(k)  is  called  the 
velocity-frequency  product  for  each  frequency,  Equation  2.11  becomes, 


Im(FXk+l)) 


T 
CO,  V 


T  '          T 

cos(co,v)  sin(cOjV)    0 

T  T 

-sin(  co ,  v)  cos(  co ,  v)   0 

0  0          1 


Re(Fflt)) 

ImCF^k)) 


T 

co,v 


(2.26) 


where  v  contains  the  two  velocity  components  as, 
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V    = 


(2.27) 


The  non-linear  state  matrix  of  Equation  2.26  can  be  linearized  around  the  current 
estimate  of  the  state  by  using  the  Jacobian  operator, 


A(X(k\k))  = 


da 
dX 


(2.28) 


X=X(k\k) 

The  linearized  state  equation  for  a  single  frequency  is, 


X.(*+l)  =  Aftfm  *fto  +  Zffi, 


(2.29) 


where, 


cos(4(fc))    sin(4(Q)    [-4(^)sin(4(^))+4(^)cos(4(^))] 
-sin(4(fc))   cos(4(it))    [-4(it)cos(4(fc))-4(*:)sin(4(A))] 


0 


0 


1 


(2.30) 


The  single  frequency  model  (Equation  2.29),  the  measurement  equation  (Equation  2.20), 
the  linearized  state  matrix  (Equation  2.30)  combine  with  Kalman  filter  equations  to  fully 
specify  the  single  frequency  EKF's.  Application  of  these  equations  yields  estimates  of  the 
real  and  imaginary  components  of  the  Fourier  coefficients  and  the  frequency-velocity 
products. 
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D.      VELOCITY  ESTIMATION 

The  single  frequency  EKF's  each  yield  an  estimated  frequency-velocity  product. 
These  estimates  can  be  combined  to  yield  an  estimate  of  the  velocity  of  the  moving 
object.  This  final  estimate  is  computed  by  using  a  weighted  least-squares  algorithm. 
Unfortunately  this  estimation  algorithm  is  complicated  by  an  ambiguity  of  multiplies  of 
2-7T  in  the  frequency-velocity  product  estimates.  This  ambiguity  is  distinct  from  the 
problem  of  phase  unwrapping  since  the  single  frequency  EKF's  are  free  to  converge  to 
any  of  the  possible  multiplies  and  do  not  have  the  rigid  structure  to  the  phase  unwrapped 
signal  [Ref.  1].  By  including  these  ambiguity  terms  the  frequency-velocity  estimates  can 
be  modelled  by, 


XJk\k)  =  Qv  +  w+27iAn,  (2.31) 


where, 


N 


w  =  [wltw2,...,wNi],  (2.33) 


and 


m  =  [ml,m2,...,mNi].  (2.34) 

The  term  w  is  a  zero-mean,  Gaussian  random  vector  with  the  estimated  covariance, 

E  =  E[wwT]  =  diag[PmJ>2Si,...J>lto3l  (2.35) 

where  PU}  is  the  3,3  component  of  the  z'th  estimation  error  covariance  matrix  computed 
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by  each  of  the  SFEKF's.  As  stated  before,  the  frequency-velocity  product  enters  the  state 
equation  only  as  a  phase  term  and  is  ambiguous  with  respect  to  a  shift  of  a  multiple  of 
2x.  The  term  m  is  an  unknown  integer  vector  which  models  this  ambiguity. 

The  estimates  from  Equation  2.31  can  be  used  to  estimate  the  velocity  by  finding 
the  real  vector  v  and  integer  vector  m,  that  minimizes  the  weighted  sum  of  the  squared 
errors, 

J=(X3(k\k)-Qv-2nm)T  E-l(X3(k\k)-Qv-2nm).  (2-36) 

The  vectors  v  and  m  that  minimize  Equation  2.36  can  be  found  by  performing  a  search 
over  all  possible  values  of  m.  For  each  value  of  m,  finding  the  value  of  v  that  minimizes 
Equation  2.36  is  equivalent  to  the  classical  Weighted  Least  Squares  problem.  A  new  data 
vector  may  be  defined  as, 

X  =  X3  (k\k)  -  2izm,  (2-37) 

and  the  sum  of  the  squared  error  is, 

J  =  (X-Qv^E^CX-Gv).  <2-38) 

Constraining  the  velocity  estimate  to  be  a  linear  function  of  the  data  yields, 

v  =  GVX3(*|*),  (2-39) 

where, 

Gv  =  (Q'E^Q)-1  QrS-1.  (2-40) 

Substituting  Equation  2.39  into  Equation  2.36  yields  the  sum  of  the  error  as  a  function 


15 


of  m, 

J(m)  =  (XikW-lnmfX-^I-QGJiX^klQ-lnm).  (2-41) 

Using  Equation  2.41,  m  can  be  estimated  using  a  search  of  all  possible  vectors  m  to  find 
the  one  that  minimizes  J(m). 

A  simplification  to  this  search  for  the  optimal  vector  m  is  developed  by  Burl  [Ref. 
l:p.  12].  The  values  of  the  m,'s  will  be  zero  for  low  spatial  frequencies.  So  by  using  this 
subset  of  frequency  components,  the  linear  estimator  of  Equation  2.39  yields  v^.  This 
velocity  estimate  is  used  to  estimate  the  m,'s  that  do  not  meet  the  condition  m=0  by, 

mt  =  Round(±[Xi3(k\k)-uJvLJ),  (2.42) 

where  Round(*)  equals  the  integer  nearest  to  •.  This  estimate,  in  turn,  is  used  to  estimate 
the  velocity  using  Equation  2.37  and  2.39.  The  estimated  values  of  m  are  included  in  the 
frequency-velocity  product  estimates  that  are  fed  back  to  single  frequency  EKF's. 
Therefore,  once  the  filter  has  converged  the  values  of  m  will  all  be  zero. 

E.      IMAGES  WITH  NONZERO  BACKGROUND 

The  background  in  most  image  sequences  will  be  non-zero.  A  method  of  modifying 
the  MEKF  for  use  in  this  case  is  proposed  by  Burl  [Ref.  1].  This  method  proposes  that 
2  more  states  are  added  to  the  single  frequency  EKF's.  The  additional  states  are  defined 
as  the  real  and  imaginary  portions  of  the  background  in  the  spatial  frequency  domain 
and, 
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x/®  = 


Xtl(k) 

MFfk)) 

Xi2(k) 

Im(Ffk)) 

Xa(k) 

= 

*4FM 

Xi4(k) 

ImiFjm 

X^k) 

a^vik) 

(2.43) 


where  F^fk)  is  the  Fourier  transform  of  the  stationary  background.  The  state  transition 
matrix  for  these  new  states  are  simply  the  identity  matrix  since  the  background  is 
considered  to  be  constant.  The  state  matrix  of  Equation  2.30  then  becomes, 

A,(X,(*))  = 
cos(4(*))    sin(X.5(A:))    0  0    [-4(*)sin(4(*))+4(fc)cos(4(*))] 

-sin(4(*))  cos(4(*))  0  0  [-4(fc)cos(4(^))-4(*)sin(4(AD)]     (2.44) 

0                  0  10                              0 

0                  0  0   1                              0 

0                  0  0  0                              1 


In  this  case  the  measurement  is  modeled  as  the  sum  of  the  moving  object  state,  the 
stationary  background  state,  and  the  additive  noise. 
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Figure  2.1  The  Modified  Extended  Kalman  Filter. 
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m.    SPATIO  -  TEMPORAL  GRADIENT  APPROACH 

A.      THE  IMAGE  FLOW  CONSTRAINT  EQUATION 

1.       The  Image  Flow  Constraint  Equation 

This  method  for  computing  the  velocity  field  (optical  flow)  employs  the  first- 
order  spatial  and  temporal  differentials  of  a  time  varying  image  to  estimate  the 
component  of  motion  at  each  point.  We  can  derive  an  algorithm  that  relates  the  change 
in  image  brightness  at  a  point  to  the  motion  of  the  brightness  pattern.  Let  the  image 
brightness  at  a  point  (x,y)  in  the  image  plane  at  time  t  be  denoted  by  E(x,y,t).  Now 
consider  that  the  pattern  moves.  The  brightness  of  a  particular  point  in  the  pattern  is 
constant,  so  that, 

—  =  0.  (3.1) 

dt 

Using  the  chain  rule  for  differentiation  we  see  that, 

dEdx   +   dEdy  +   BE  =  Q  (3  2) 

dx  dt        dy  dt        dt 

(A  more  detailed  derivation  with  Taylor  series  expansion  is  included  in  Appendix  A). 
This  equation  can  simply  be  written  as, 

Ex  u  +  Ey  v  +  Et  =  0,  (3.3) 

where, 
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u  = 


dx 
dt 


dy 
dt 


(3.4) 


E    -  —       E    -  —        E    -    dE 


dx  y        dy  '        dt 

The  partial  derivatives  Ex,  Ey,  E,  are  estimated  from  the  image  intensity  values  and  will 
be  formulated  later.  The  constraint  given  by  Equation  3.2  on  the  local  flow  velocity  is 
illustrated  in  Figure  1  [Ref.  3:pp.  20-46]. 


Figure  3.1.  The  locus  of  points  satisfied  by  the  image  flow  constraint  equation  defines 
a  line  in  the  velocity  space. 
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This  constraint  is  often  called  as  the  image  flow  constraint  equation  since  it  expresses  a 
constraint  on  the  components  u  and  v  of  the  image  flow.  As  seen  in  Figure  3.1,  values 
of  (u,v)  satisfying  the  constraint  equation  lie  on  a  straight  line  in  velocity  space.  Two 
conditions  are  required  to  ensure  the  validity  of  the  image  flow  constraint  equation;  (1) 
the  change  in  image  irradiance  at  each  point  must  be  entirely  due  to  the  motion  of  the 
image  pattern,  as  opposed  to  changes  in  the  pattern  due  to  reflectance  effects,  and  (2)  the 
image  should  be  smooth  except  at  a  finite  number  of  discontinuities  [Ref.  4:pp.  185-189]. 
Surface  based  smoothing  may  be  used  to  improve  the  velocity  estimates  for  images  with 
discontinuities.  These  conditions  are  probably  too  restrictive  to  allow  image  flow  to  be 
useful,  but  the  image  flow  equation  is  obeyed  in  practice  with  sufficient  accuracy  for 
tasks  such  as  segmentation. 

The  image  flow  can  not  be  computed  at  a  point  in  the  image  independently 
of  neighboring  points  without  introducing  additional  constraints.  This  limitation  occurs 
because  the  velocity  field  at  each  point  has  two  components  while  the  change  in  image 
irradiance  at  a  point  due  to  motion  yields  only  one  constraint.  Estimation  approaches  for 
this  problem,  in  one  manner  or  another  introduce  additional  constraints  that  allow 
mathematical  solutions.  One  solution  to  this  dilemma  is  to  assume  that  in  the  local  image 
region  there  are  two  values  of  E(x,y,t)  that  have  the  same  image  motion,  i.e.  adjacent 
pixels  exhibit  similar  image  motion.  Defining  these  two  points  as  E,  =  E(x,,y,,t)  and 
E2  =  E(x2,y2tt),  we  can  write  the  set  of  two  equations, 
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dE.  BE.  BE. 

— lu  +  — Lv  +  — I  =  0, 
dx  dy  dt 

(3.5) 

oE~  oE^  BE. 

— 2-u  +  — 2-v  +  — ?•  =  0, 
dx  dy  dt 


or  the  matrix  equation, 


£  =  «„«.  (3-6) 

where  u  =  fu  y/T,  and  the  matrix  5"  is  formed  with  the  appropriate  values  from  the 
equations  above.  Inverting  S  yields  fu  vf  [Ref.  5:pp.  230-238].  However,  any  errors  in 
the  spatial  and  temporal  derivative  estimates  may  have  severe  effect  on  the  velocity 
estimates.  Furthermore  there  is  no  guarantee  that  S  is  invertible.  The  local  motion 
homogeneity,  although  perhaps  valid  for  image  points  corresponding  to  object  points  that 
are  close  together  on  the  moving  object,  is  not  a  valid  assumption  when  adjacent  image 
points  correspond  to  the  motion  of  independently  moving  and  occluding  objects. 

An  alternative  approach  is  to  explicitly  formulate  a  local  smoothness 
constraint  on  the  velocity  vector,  that  is  to  assume  that  objects  of  small  size  are 
undergoing  rigid  motion  and  deformation.  In  this  case  neighboring  points  on  the  objects 
have  similar  velocities  and  the  velocity  field  of  the  brightness  patterns  in  the  image  varies 
smoothly  almost  everywhere.  Discontinuities  in  flow  can  be  expected  at  the  object 
boundaries  and  also  where  one  object  occludes  another.  This  may  be  formulated 
mathematically  as  the  additional  constraint  which  is  to  minimize  the  sum  of  the  squares 
of  the  Laplacians  of  the  x  and  v  components  of  the  flow.  The  Laplacians  of  u  and  v  are 
defined  as, 
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V*w  =  ^+^      and     *v-**J*l.  (3.7) 

&t2    dy2  dx2    dy2 

Now  we  can  introduce  the  optimization  problem.  The  optimization  measure  consist  of 

two  terms:  a  penalty  on  the  deviation  of  estimated  velocity  field  from  the  image  flow 

constraint  equation  and  a  penalty  on  the  departure  of  velocity  components  from 

smoothness  (the  smoothness  penalty  was  the  sum  of  the  squares  of  the  magnitude  of  the 

gradient  of  the  image  flow  velocity).  These  constraints  may  be  written  as, 

$u    #u    d*v    ch 
Minimize,  e,  =  + + + 

dx2    dy2    dx2    dy2  (3.8) 

subject  to,  z2  =  Ex  u  +  E  v  +  Et  =  0  . 

The  total  error  to  be  minimized  becomes, 

er  =  ej  +  X  e2.  (3-9) 

2.       Estimating  the  Partial  Derivatives  and  the  Laplacian  of  Flow  Velocities 

We  must  estimate  the  derivatives  of  the  brightness  from  the  discrete  set  of 
image  brightness  measurements  available.  The  consistency  of  partial  derivatives  are  very 
important.  That  is  they  should  refer  to  the  same  point  in  the  image  at  the  same  time. 
Although  there  are  many  formulas  for  approximate  differentiation,  an  optimum  method 
given  by  Schunk  [Ref.  4:p.  190]  can  be  used,  which  gives  an  estimate  of  Ex,Ey,E,  at  a 
point  in  the  center  of  a  cube  formed  by  eight  measurements.  The  relationship  in  space 
and  time  between  these  measurements  is  shown  in  Figure  3.2.  Each  of  these  estimates 
is  the  average  of  four  first  differences  taken  over  adjacent  measurements  in  the  cube. 
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Figure  3.2.  The  three  partial  derivatives  of  image  brightness  are  estimated  from  the 
average  of  first  differences  along  four  parallel  edges  of  the  cube. 


(3.10) 


Ex    ~    ^EiJ*lJc    EtJJc+Ei*lJ*lJt    Ei*lJJt 

+Eij*lJc*l       ijJc*l       i*lj+lje*l      i*ljJt*l*' 

Ey   ~  ^Ei*lJJt~EiJJc+Ei*iJ*lJt~EU*U 

+Ei*ljJc*l  ~EijJc+l  +Ei*lj+lJt+l  ~Eij+lMl ' ' 

Et  ~  ^EiJJt*l~EiJJc+EiJ*lJt*l~EiJ*lJt 

*Ei+ljJc*l~Ei*lJJc+Ei+lJ+lJc*l~Ei*lj*lJc*m 

We  also  need  to  approximate  Laplacians  of  u  and  v.   One  convenient 
approximation  takes  the  following  form, 
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v*«  «  (u  -  u)    and    V*v  *  (v  -  v), 


(3.11) 


where  the  local  averages  u  and  v  are  defined  as  follows, 


if  i 

viVJfc  ""   6  iv<+vjt+vi-ijyk+vv+u+vv-u' 

J_,  . 

+  12  V,+1^+1'Jt+Vl+1^-1'*+V,'-1^+1'jk+Vi-1^-1^^ 


(3.12) 


Figure  3.3    The  Laplacian  is  estimated  from  a  weighted  average  of  the  values  at 
neighboring  points. 
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3.       Minimization 

The  minimization  of  total  error  eris  a  constrained  minimization  process  which 
has  been  formulated  by  Lagrange  multipliers  in  Equation  3.9  (there  are  several  other 
ways).  The  solution  to  the  Lagrange  formulation  is  given  by  finding  values  of  u  =  [u,vf 
that  satisfy, 

"7T  =0' 

(3.13) 

dcT 
—  =  0. 

dX 

Using  the  calculus  of  variations  [Ref.  6:pp.  107-178]  yields  the  solution  as, 

E\  u  +  Ex  Ey  v  =  X  Vu  -  Ex  Et,  (3  U) 

Ex  Ey  u  *  Ey  v  =  X  Vv  -  £y  £,. 

Using  the  approximation  to  the  Laplacian  introduced  previously  gives  the  folowing 
equations, 


(A  +  El)u  +  Ex  Ey  v  =  (X  u  -  Ex  E),  ~  ^ 

(Ex  Eu  +  (X  ♦  £>  =  (X  v  -  E  E). 


Solving  for  u  and  v  gives, 


u  =  u  -  (El  u  +  Ex  E   v  +  Ex  E)/(X  +  E\  +  Ey) 


-y    "    T   ^x  ~|*V»    T  *-*    T    *V  (3.16) 


v  =  v  -  (Ex  Ey  u  +  £;  v  +  £y  Et)/(X  ♦  £x'  +  £p 


These  results  yield  two  solutions  for  X  such  that, 
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k  =  -\E\  u+Ex  Ey  v+Ex  Et]/(u  -  u),  (3  n) 

=  -[Ex  Ey  u+E2y  v+Ey  Et]/(v  -  v). 

Notice  that  these  equations  constrain  u  and  v  but  return  no  new  information.  They  also 
yield, 

e2  =  0. 

Now  we  have  a  set  of  two  equations  for  each  point  in  the  image.  It  would  be 
very  costly  to  solve  these  equations  simultaneously  (also  with  the  Lagrange  multipliers). 
Instead  an  iterative  method  can  be  developed.  We  can  compute  a  new  set  of  velocity 
estimates  (if+1,V+1)  from  the  estimated  derivatives  and  the  average  of  previous  velocity 
estimates  (it,Y)  by, 


uB+1  =  u n  -  Ex[Ex  un  +Ey  v"  +  Et]/(X  +  Ex  +  Ey), 

v»*l    =    - n    _   £  r£     ~n    +   E     - n    +   £y^    +   £2    +   £2} 

y     *  y  '  *  y 


(3.18) 


(3.19) 


The  algorithm  stops  when  convergence  is  obtained:  that  is  when, 

u\x,y,t)  -  un-\x,y,t)    <.    ev 
vrt(A:,y,r)  -  v"\x,y,t)    <.    er 

In  parts  of  the  image  where  the  brightness  gradient  is  zero,  the  velocity 
estimates  will  simply  be  the  averages  of  the  neighboring  velocity  estimates.  There  is  no 
local  information  to  constraint  the  apparent  velocity  of  motion  of  the  brightness  pattern 
in  these  areas.  Eventually  the  values  in  these  areas  will  propagate  inward.  If  the 
velocities  on  the  border  of  a  region  are  all  equal  to  the  same  value,  then  points  in  the 
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region  will  be  assigned  that  value  too  (after  a  sufficient  number  of  iterations).  If  the 
values  on  the  border  are  not  the  same,  it  is  a  little  more  difficult  to  predict  what  will 
happen.  In  all  cases  the  values  filled  in  will  correspond  to  the  solution  of  the  Laplace 
equation  for  a  given  boundary  condition. 

The  best  result  in  most  cases  is  provided  if  the  algorithm  is  performed  a  small 
number  of  times  per  frame  of  a  test  sequence.  In  the  first  few  frames  an  image  flow 
velocity  will  be  estimated  within  the  boundaries  of  moving  object.  The  velocity  field 
outside  the  object  area  will  be  zero  (or  very  small).  As  the  object  moves,  the  region  of 
nonzero  velocity  field  does  not  move  along  with  the  moving  object,  since  there  is  nothing 
in  the  algorithm  that  forces  the  velocity  field  to  be  extrapolated  in  the  direction  of 
motion.  Since  the  location  of  the  region  of  nonzero  velocity  field  is  not  forced  to  move 
along  with  the  object,  the  translating  object  moves  out  from  under  the  region  of  nonzero 
velocity  vectors.  New  nonzero  velocity  vectors  are  formed  at  the  new  position  of  the 
object.  The  algorithm  does  not  eliminate  the  old  region  of  velocity  that  trails  behind  the 
moving  object.  This  creates  a  'comet  tail'  behind  the  object.  In  regions  where  the 
gradient  is  zero,  the  algorithm  degenerates  into  the  Laplace  equation  which  smooths  the 
velocity  field.  The  region  of  nonzero  velocity  vectors  that  trails  behind  the  moving  object 
is  also  smoothed  into  a  consistent  (though  obviously  wrong)  velocity  field  which  persist 
long  after  the  object  moves  beyond  the  boundaries  of  the  image.  As  mentioned  before, 
the  algorithm  cannot  work  in  cases  where  there  are  motion  boundaries  in  the  velocity 
field,  or  more  than  one  moving  object  exist  int  he  scene,  since  the  smoothness  constraint 
leads  to  iterative  equations  that  blur  abrupt  changes  in  the  velocity  field. 
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One  may  use  a  smoothing  algorithm  for  improving  the  velocity  field 
estimation.  The  smoothing  algorithm  constructs  a  smooth  estimate  of  the  velocity  field 
by  approximating  a  surface  between  step  discontinuities.  Another  algorithm  called 
Constraint  Line  Clustering  that  utilizes  the  polar  form  of  the  image  flow  constraint 
equation  to  estimate  the  image  flow  velocity  field  when  there  are  discontinuities  is 
described  by  Schunk  [Ref.  7:pp.  1010-1027]. 

B.      CONSTRAINT  LINE  CLUSTERING 

1.       The  Polar  Form  of  the  Constraint  Equation 

The  image  flow  constraint  equation  defines  a  line  in  velocity  space  as  shown 
in  Figure  3.1.  The  line  in  velocity  space  that  is  the  locus  of  possible  velocities  specified 
by  the  image  flow  constraint  equation  is  called  the  constraint  line.  The  constraint  line 
may  be  uniquely  defined  by  d,  the  distance  of  the  line  from  the  origin  along  the 
perpendicular  bisector  and  the  angle  a  with  respect  to  the  u  axis.  The  displacement  and 
the  angle  of  the  constraint  line  are  given  by, 

\£t\ 
d  =  (3.20) 

J  arctanCE^Ep       if  Et  *  0;  ^2\) 

1  arctan(-£'Jc,-£J    otherwise. 

To  derive  the  constraint  equation  in  polar  coordinates,  we  first  need  to  note  that  the 

image  flow  equation  contains  the  dot  product  of  the  gradient  of  the  image  irradiance  with 
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the  velocity  vector, 

Ex  u  +  Ey  v  +  Et  =  VE  .  (u,  v)  +  Et.  (3.22) 

If  we  define  p  as  the  speed  of  motion  and  /3  as  the  direction  of  motion  contained  in  the 
constraint  equation,  then  the  dot  product  in  Equation  3.22  becomes, 

p   |VE|  cos(  a  -  p  ).  (3-23) 

Dividing  through  by  the  magnitude  of  the  gradient  yields  the  polar  form  of  the  image 
flow  constraint  equation, 

d  =  p  cos(  a  -  p  ).  (3.24) 

Note  that  j3  is  constrained  to  be  between  a-x/2  and  a+x/2.  Since  the  displacement  d  of 
the  constraint  line  from  the  origin  must  be  nonnegative,  the  orientation  a  is  reflected 
when  E,  >  0.  The  displacement  d  is  the  projection  of  the  motion  vector  onto  the  line  of 
the  gradient  of  image  irradiance  and  is  independent  of  the  magnitude  or  polarity  of  the 
gradient. 

The  polar  form  of  the  image  flow  constraint  equation  can  be  used  in  the 
analysis  of  algorithms  for  image  flow  estimation  with  motion  boundaries  since  the  polar 
form  will  not  contain  5-functions  at  step  discontinuities.  The  polar  form  will  not  contain 
5-functions  since  the  displacement  d  is  the  ratio  of  the  magnitude  of  the  change  in  time 
of  the  image  irradiance  to  the  magnitude  of  the  gradient  of  image  irradiance  and  this  ratio 
remains  finite  in  the  limit  as  the  image  irradiance  becomes  an  ideal  step  discontinuity. 

The  constraints  are  sufficient  to  allow  the  sampled  image  sequence  to 
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accurately  portray  the  local  motion  data  between  motion  boundaries,  but  are  not  sufficient 
to  capture  the  motion  information  at  motion  boundaries.  The  motion  constraints  are 
extremely  sensitive  to  the  effects  of  occlusion  boundaries  and  are  usually  in  severe  error. 
No  practical  sampling  period  can  provide  useful  motion  information  along  a  motion 
boundary.  An  image  flow  estimation  algorithm  must  be  able  to  work  in  situations  where 
there  are  motion  boundaries  in  the  image.  So  the  algorithm  called  constraint  line 
clustering  [Ref.  7:pp.  1010-1027]  that  estimates  the  image  flow  field  when  the  image 
irradiance  pattern  or  the  velocity  field  contains  discontinuities  is  presented  in  the  next 
section. 

2.       Constraint  Line  Clustering 

The  constraint  line  clustering  algorithm  uses  the  polar  form  of  the  image  flow 
constraint  equation  given  by, 

d  =  p  cos(  a  -  p  ),  (3-24) 

where  p(x,y)  and  fi(x,y)  are  the  speed  and  direction  of  motion,  respectively.  The  velocity 
vector  (p,0)  for  the  motion  at  any  point  in  the  image  must  lie  along  the  line  in  velocity 
space  defined  by  the  image  flow  constraint  equation  shown  in  Figure  3.1.  The  constraint 
line  is  uniquely  defined  by  the  displacement  d  of  the  constraint  line  from  the  origin  and 
the  orientation  a  of  the  constraint  line.  The  constraint  line  displacement  has  the 
dimensions  of  speed;  it  is  the  minimum  speed  consistent  with  the  motion  constraint. 

Assume  that  the  image  intensity  E(x,y,t)  already  incorporates  any  spatial  or 
temporal  filtering  performed  on  the  image.  For  example,  the  image  could  be  smoothed 
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Figure  3.4   The  position  of  the  intersection  of  two  constraint  lines  along  one  of  the 
constraint  lines. 


with  a  Gaussian  filter.  The  image  is  sampled  in  space  and  time  to  produce  an  image 
sequence  Eftj^fJ.  The  d  and  a  intrinsic  image  arrays  d(xity)  and  afx^yj  are  computed 
from  the  image  sequence  using  the  formulas  in  Equations  3.20  and  3.21.  The  partial 
derivatives  are  computed  by  the  differences  over  a  cube  in  space  and  time  by  equations 
given  by  Equation  3.10.  The  image  flow  estimation  problem  is  to  compute  the  intrinsic 
image  arrays  pfx^yj  and  Pfx^yJ  for  the  speed  and  direction  of  motion  from  the  motion 
measurements  in  the  d  and  a  arrays.  The  constraint  line  clustering  algorithm  uses  a 
cluster  analysis  in  one  dimension  to  extract  the  motion  estimate  from  contradictory  data. 
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Suppose  that  for  each  d  and  a  measurement,  a  set  of  measurements  {  dit  a,  }  is  taken 
from  some  spatial  neighborhood  of  the  point  that  generated  d  and  a.  Compute  the  set  of 
intersections  of  each  of  the  neighboring  measurements  d,  and  a,  with  the  given  d  and  a 
measurement.  All  of  these  intersections  lie  along  the  line  defined  by  d  and  a.  Any 
constraint  lines  in  {  d„  a,  }  that  are  part  of  the  same  region  of  motion  as  d  and  a  will 
tend  to  intersect  the  line  defined  by  d  and  a  in  a  tight  cluster  around  the  true  velocity. 
Any  constraint  lines  that  are  from  different  regions  of  different  motion  will  intersect  the 
line  defined  by  d  and  a  over  a  broad  range  of  positions.  The  incorrect  constraint  lines 
generated  along  a  motion  boundary  will  not  intersect  the  line  defined  by  d  and  a  at  a 
consistent  point. 

The  formula  for  the  position  of  intersection  along  a  constraint  line  can  be 
easily  derived  by  first  rotating  the  coordinate  system  so  that  the  constraint  line  defined 
by  d  and  a  is  parallel  to  the  vertical  axis  in  velocity  space.  Let  the  angle  between  the  two 
constraint  lines  be  denoted  by  <j>  =  a  '-a  as  shown  in  Figure  3.4.  The  distance  of  the 
intersection  along  the  constraint  line  may  be  computed  by, 

b  =  — — .  (3.25) 

tan(4>) 

The  distance  c  is  related  to  d,  d'  and  <f>  by, 

(  d  +  c  )  cos(4>)  =  d'.  (3-26) 

Using  Equation  3.25  to  eliminate  c  from  Equation  3.26  yields, 


33 


d  cos(4>)  +  b  sin  (<J>)  =  d'.  (3-27) 

Solving  for  b  and  substituting  using  the  definition  for  <t>  yields, 

,        d'-d  cos((|))       d'-d  cos(ol'-ol)  /o  28^ 

sin((J))  sinCa'-a) 

This  formula  provides  the  position  along  a  constraint  line  defined  by  d  and  a  of  the 

intersection  of  the  constraint  line  with  another  constraint  line  defined  by  d'  and  a'.  So 

given  a  set  {  b,  }  of  intersections  of  the  constraint  lines  within  a  neighborhood  with  a 

constraint  line  at  the  center  of  the  neighborhood,  the  set  of  intersections  may  be  analyzed 

to  determine  the  most  consistent  subset  of  intersections  that  cluster  about  the  likely 

velocity.  The  cluster  analysis  criterion  is  motivated  and  explained  by  imagining  a  motion 

boundary  passing  almost  vertically  through  a  neighborhood  just  to  the  left  of  a  center 

element.  At  any  reasonable  pixel  resolution,  the  boundary  will  most  likely  pass  smoothly 

through  the  neighborhood  dividing  the  neighborhood  almost  in  half.  The  region  on  one 

side  corresponds  to  one  surface  in  the  scene  and  the  other  side  including  the  center 

element  corresponds  to  a  different  surface.  In  the  extreme  case  where  the  boundary 

passes  very  close  to  the  center  of  the  neighborhood,  almost  half  of  the  intersections  of 

neighborhood  constraint  lines  with  the  constraint  line  from  the  center  of  the  neighborhood 

will  be  with  constraints  from  the  other  side  of  the  surface  or  the  motion  boundary  itself. 

These  intersections  will  most  likely  not  be  close  to  the  velocity  of  the  surface  from  which 

the  center  constraint  line  was  obtained.  Almost  half  of  the  intersections  may  be  useless 

in  the  most  extreme  case;  but  at  least  half  of  the  intersections  will  correspond  to  the  same 
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region  of  motion.  Since  rejecting  correct  intersections  is  less  harmful  than  accepting 
incorrect  intersections,  a  conservative  algorithm  can  be  taken:  the  algorithm  looks  for  the 
tightest  cluster  that  contains  roughly  half  of  the  intersections.  The  cluster  analysis  is  done 
in  one-dimension  which  makes  it  easy.  The  algorithm  sorts  the  set  of  n  intersections  {bj 
and  then  looks  for  the  tightest  interval  that  contains  half  of  the  intersections  by  examining 
successive  pair  of  intersections  that  are  [n/2]  intersections  apart.  The  algorithm  chooses 
the  pair  of  intersections  that  are  closest  together  as  the  estimate  of  the  majority  cluster 
of  intersections.  Then  the  center  position  V  of  the  tightest  cluster  along  the  constraint 
line  is  the  midpoint  of  the  smallest  interval  that  contains  roughly  half  of  the  intersections. 
So  the  estimated  speed  of  motion  is  given  by 

and  the  estimated  direction  of  motion  is  given  by, 

P  =  a  +  tan"1^  /  d).  <3-30) 

The  estimated  speed  and  direction  of  motion  (v,  0)  computed  with  constraint  line 
clustering  avoids  the  error  problem  in  motion  boundaries  and  gives  another  solution  to 
the  image  flow  constraint  equation.  Also  the  constraint  line  clustering  algorithm  is  very 
robust  and  works  even  if  the  basic  assumption  that  at  least  half  of  the  intersections  must 
be  from  the  same  region  of  motion  is  violated.  For  example  if  the  center  constraint  is 
from  just  inside  a  right  angle  corner  so  that  only  1/4  of  the  intersections  are  from  the 
same  region  of  motion,  the  algorithm  still  works  because  the  intersections  from 
constraints  outside  the  corner  are  almost  uniformly  distributed  along  the  constraint  line 
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and  do  not  bias  the  estimate  of  the  tightest  cluster. 

The  constraint  line  intersections  are  not  always  numerically  well  conditioned. 
When  two  constraint  lines  are  close  in  orientation  there  can  be  a  large  error  in  the 
intersection  position.  When  the  constraint  lines  have  the  same  orientation  and  different 
displacements,  the  lines  will  not  intersect.  In  spite  of  these  objections,  the  constraint  line 
clustering  algorithm  is  robust  by  the  fact  that  only  the  tightest  group  of  points  are 
retained  to  compute  the  motion  estimate.  Nearly  half  of  the  intersection  positions  will  be 
discarded.  Any  grossly  incorrect  intersections  will  most  probably  be  discarded  since  they 
lie  outside  the  tightest  cluster. 

The  algorithm  is  also  very  insensitive  to  the  neighborhood  size.  Good  results 
may  be  obtained  even  for  the  minimum  neighborhood  size  of  3  by  3.  Larger 
neighborhood  sizes  produce  only  slightly  better  motion  estimates.  There  are  two  reasons 
for  this.  The  first  reason  is  that  the  accuracy  of  the  motion  estimate  is  limited  by  the 
accuracy  of  the  constraint  line  along  which  the  motion  estimate  is  obtained.  The  second 
reason  is  that  the  number  of  points  in  the  neighborhood  may  have  little  impact  on  the 
quality  of  the  motion  estimate;  it  is  the  spatial  extent  of  the  neighborhood  relative  to  the 
gradient  variation  that  determines  the  quality  of  the  motion  estimate.  In  order  to  get  good 
intersection  measures,  motion  constraints  must  be  used  from  beyond  the  region  of  similar 
gradient.  The  motion  estimate  can  be  greatly  improved  by  increasing  the  spatial  coverage 
of  the  neighborhood.  One  apparent  weakness  with  the  constraint  line  clustering  is  that  it 
depends  on  the  accuracy  of  the  constraint  line  at  the  center  of  the  neighborhood.  This  is 
not  a  problem  with  the  constraint  line  clustering  algorithm  itself  but  is  an  instance  of  a 
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more  fundamental  problem  with  assigning  a  motion  estimate  to  a  point  in  an  image.  If 
the  constraint  equation  is  grossly  incorrect,  then  there  is  a  fundamental  ambiguity  in 
assigning  a  motion  estimate  to  the  corresponding  image  location. 

The  information  in  image  flow  constraint  equations  could  be  combined  by 
least  squares  methods.  The  image  flow  constraint  equation  is  one  constraint  with  two 
unknowns.  The  equations  in  a  neighborhood  could  be  solved  by  least  squares  methods 
for  an  initial  motion  estimate. 
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IV.   ONI^DIMENSIONAL  FOURIER  TRANSFORM  FOR  TRACKING  MOVING 
OBJECTS 

A.  INTRODUCTION 

The  method  presented  in  this  chapter  brings  a  solution  to  the  problem  of 
determining  motion  estimates  via  a  One-Dimensional  Fourier  transform  formulation.  The 
technique  identifies  objects  uniquely  by  their  motion  using  an  algorithm  which  is  based 
on  the  application  of  the  one-dimensional  Fourier  transform  [Ref.  8:pp.  383-388]. 

This  method  not  only  separates  the  time-varying  and  time  invariant  parts  of  an 
image  but  perform  these  operations  with  the  least  amount  of  computation.  The  method 
presented  here  associates  sinusoids  with  the  time-varying  parts  of  the  image,  such  that 
the  faster  moving  objects  are  associated  with  high  frequency  sinusoids  and  the  time- 
invariant  parts  are  associated  with  constant  levels. 

B.  THE  COSINE-AREA  TRANSFORM 

Let  a  one-dimensional  discrete  sequence  be  described  by  f(x,t),  t=0,l,...,T-l  of  T 
frames  of  size  M.  Define  a  general  transformation  as, 

A/-1 

8(0  =  £  Ax,t)  Mx\k)         r=o,i,...,r-i,  (4.1) 

where  w(x  \  k)is  a  weighing  function  of  order  k.  A  properly  chosen  weighing  function 
will  allow  us  to  use  this  weighted  area,  g(t),  to  find  the  velocity  of  a  moving  object  while 
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discriminating  against  a  stationary  background. 

If  w(x  |  k)   =  cos(2ickx/M)  is  chosen  as  a  weighting  function,  Equation  4.1 
becomes, 


Ml 

g{t)  =  £  MO  cosQnkx/M)  t=0,lt...tT-l.  (4-2) 

So  if  the  signal  f(x,t)  is  multiplied  point  for  point  with  the  amplitude  of  a  sinusoid  and 
the  result  is  summed  (integrated)  over  all  the  spatial  variables  (0,M)>  the  result  is  a  single 
value  g(tj  at  each  point  t,.  When  the  signal  is  constant  with  respect  to  time,  g(t)  will  have 
a  constant  value,  i.e.  g(t)  =  A.  When  the  time  sequence  contains  a  moving  object,  g(t) 
will  trace  out  a  sinusoid  [Ref.  9:pp.  281-283]. 

Suppose  that  for  frame  one  (t=0)  the  sequence  yields  a  one-dimensional  array  with 
M  entries  that  are  zero  except  at  one  location  xm  (where  the  object  is).  So  the  weighted 
sum  gives  g(0)=cos(2-KkxJM).  The  movement  of  the  object  by  one  location  in  the  next 
frame  (t=  1)  gives  g(l)=cos(2Trk(xm+l)/M).  If  the  object  continues  to  move  one  location 
per  frame,  g(t)  yields  a  sinusoid  with  frequency  of  k/M.  If  the  object  were  moving  v, 
locations  between  frames  then  the  sinusoid  would  have  a  frequency  vJc/M.  Since  t  varies 
between  0  and  T-l  in  integer  increments,  if  we  restrict  k  to  have  integer  values  then  the 
discrete  Fourier  transform  of  the  sinusoid  would  have  2  peaks,  one  located  at  frequency 
vjc  and  the  other  at  T-vJc.  This  latter  peak  is  due  to  the  symmetry  foldover  of  the  Fourier 
transform.  Thus  a  peak  search  in  the  Fourier  spectrum  would  yield  vjc  and  division  of 
this  quantity  by  k  yields  vx 

The  factor  k  in  the  weighting  function  has  several  properties  and  limitations. 
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Fundamentally, 

0  <  |*|  <  Ml  2,  (4-3) 

which  is  basically  determined  by  Nyquist  sampling  criteria  to  prevent  aliasing  and  for 
convenience  k  is  usually  a  positive  integer.  A  zero  k  value  allows  no  sinusoid  to  be 
created  by  a  moving  object  and  a  k  greater  than  M/2  produces  an  aliased  spectral 
response.  The  actual  value  of  k  is  chosen  from  the  frame  rate  and  the  maximum  expected 
object  velocity, 

where  f^  is  the  frequency  limitation  given  by  the  available  number  of  frames  and  the 
Nyquist  criteria.  The  maximum  and  minimum  velocities  represent  the  dynamic  range  of 
the  system,  i.e., 

v^  /  v^  =  #  frames  /  2.  (4.5) 

The  Fourier  transform  of  g(t)  provides  the  basis  for  an  efficient  and  effective 
method  for  estimating  the  angular  frequencies  of  sinusoids.  The  Fourier  transform  of  g(t) 
is  given  by, 

r-i 

G(J)  =  £  g(t)  e-^IT        /=o,i,...,r-i 

(4.6) 
r-i    mi 

=  E    E  ^x'f)  cos(2ic*c/Af)  e'W7. 
The  result  is  that  the  moving  object  generates  a  frequency  proportional  to  its  velocity. 
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All  the  stationary  objects  and  the  background  will  appear  as  a  zero  frequency  component 
in  this  transform  domain.  The  frequencies  generated  by  the  moving  objects  can  be  easily 
discerned  by  a  simple  peak  detection  algorithm. 

This  method  is  easily  generalized  for  use  on  two-dimensional  spatial  images.  Let 
the  time  sequence  of  images  be  represented  by  f(x,y,t)  with  0  <  x  <  N,  0  <  y  <  N, 
0  <  t  <  N.  The  resulting  velocity  is  a  vector  v  =  [vxvyf.  The  cosine  area  transform 
is  defined  by, 


N-l 

£  XW)  cos 

y=0 


'inkx/rf 


[2nky/N) 


-J2*ft/T 


(4.7) 


and  to  extract  velocity  components,  the  following  Fourier  transformations  are  taken, 


E 

f=0 


8x(t) 
gfl)) 


e  -J2^7. 


(4.8) 


C.      THE  GENERALIZED  AREA-TRANSFORM 

We  will  generalize  the  cosine  area  transform  in  a  way  that  will  yield  us  the 
extraction  of  both  velocity  components  simultaneously  and  identification  of  the  moving 
objects  in  the  original  sequence.  Two  Fourier  transformations  are  the  foundation  for  this 
generalization.  These  transformations  result  from  a  replacement  of  the  cos(2irkx/N)  term 
with  exp(-j2irkx/N).  The  first  transform  is  a  space  to  inverse-space  transform  whose 
values  are  subsequently  summed  over  the  second  spatial  variable;  while  the  second  is  a 
time  to  frequency  transform  [Ref.  9:p.  284].  The  transformations  for  a  sequence  of  T 
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digital  images  of  size  N x  N,  at  any  integer  instant  of  time  are  given  by, 


N-l  am 


-flukjc. 


^M  ■  E  E  >fcw)«   N      ks=o,±u... 

x=0   y=0 


N-l  N-l  -/2*kj 

Fik^t)  ='£y£Ax^)e     N        *y=0,±l,... 

y=0  x=Q 


(4.9) 


(4.10) 


G(M  =  EF(M*  T     /=o,i,...,r-i  (4-n) 


/=0 


r-i  -fl-*ft 

G%f)  =  "£  F(kyit)  e    T         /=0,l,-.,r-l  (4-12) 

It  should  be  noted  that  separate  transforms  are  used  for  each  spatial  dimension,  and  the 
first  two  transforms  are  summed  over  both  spatial  variables.  This  is  done  so  that  the 
velocity  components  can  be  resolved  for  each  dimension.  The  velocity  ku  product  is  found 
by  detecting  a  peak  in  the  spectrum  of  G(kHJ)  for  a  fixed  value  of  ku.  The  velocity  of  an 
object  in  the  dimension  u  is  then  found  by  dividing  the  frequency  of  the  peak  by  kH.  Also 
it  is  of  interest  to  note  that  a  sequence  of  frames  in  which  no  motion  takes  place  would 
yield  identical  exponential  terms  whose  Fourier  transform  would  consist  of  a  single  peak 
at  a  frequency  of  0.  Since  the  operations  are  linear,  the  general  case  involving  one  or 
more  moving  objects  in  an  arbitrary  static  background  would  have  a  Fourier  transform 
with  a  peak  at  0  (corresponding  to  the  static  image  components)  and  peaks  at  locations 
proportional  to  the  velocities  of  the  objects. 
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Figure  4.1  The  ideal  power  spectrum  (G(k„,f))  with  DFT. 

The  determination  of  a  velocity  component  requires  the  use  of  phase  information 
derived  from  the  real  and  imaginary  parts  of  F(k„J).  Since  the  speed  of  object  was 
directly  obtained  from  the  spectrum  of  G(kuJ),  the  phase  information  in  F(ku,t)  is  used 
to  derive  the  sign  of  the  velocity.  Once  a  moving  object  is  found,  comparing  the  signs 
of  the  equations  below  determines  the  sign  of  the  velocity; 


C      - 

d2Re{F(kuyt)} 

01 

dt2 

c 

<Plm{F(kujf)} 

D2 

dt2 

(4.13) 


t=t. 


(4.14) 


*=<. 


Since  F(kH,t)  is  a  sinusoidal,  S,  and  S2  will  have  the  same  sign  at  an  arbitrary  point  in 
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time  t,  if  the  velocity  has  a  positive  sign.  The  velocity  will  be  negative  if  St  and  S2  have 
opposite  signs.  It  should  be  noted  that  the  ambiguity  resulting  when  S,  or  S2  equals  zero 
is  avoided  by  using  the  closest  point  in  time  t  =  t,  ±  At. 

The  algorithm  proposes  that  the  1-D  FFT  (DVT)  spectrum  of  F(ku,t)  provided  by 
Equation  4.11,  4.12  will  have  a  peak  proportional  to  the  velocity  components  of  the 
moving  object  as  shown  in  Figure  4.1.  In  reality  the  algorithm  yields  a  wide  spectrum. 
This  spectrum  has  "smeared"  frequency  peaks  instead  of  one  frequency  peak  at  the 
desired  location  as  shown  in  Figure  4.2. 
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Figure  4.2  The  "smeared"  power  spectrum  (G^f))  with  DFT. 


There  are  mainly  2  reasons  for  this  undesirable  result. 

(i)  The  undesirable  signal  components  in  the  image  frames  (noise,  occluding  boundary 

effects,  rotation  of  object,  non-stationary  background  etc.).  These  "noise  effects"  can 
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cause  spectral  spreading. 

(ii)  The  loss  of  frequency  resolution  due  to  limited  number  of  frames. 

The  first  condition  has  already  been  mentioned  where  the  model  did  not  include  the 
non-stationary  and  non-uniform  background.  The  effect  of  noise  is  always  a  problem  for 
any  algorithm. 

The  second  condition  is  caused  by  the  situations  where  the  equation  VmJVKA  —  # 
frames  12  is  not  satisfied  due  to  the  lack  of  available  frames.  Note  that  if  V^  is  taken  to 
be  1  pixel  per  unit  time,  V^  is  limited  by  the  number  of  frames  available.  This 
limitation  may  be  overcome  by  assigning  the  appropriate  value  to  ku.  For  the  case  where 
image  size  is  an  integer  multiple  of  the  available  number  of  frames,  the  maximum  peak 
occurs  at  the  frequencies  integer  divident  of  actual  velocity.  For  other  cases,  the  single 
frequency  component  at  kju  will  be  "smeared"  out  to  other  frequencies.  This  smearing 
effect  is  defined  as  "leakage"  [Ref.  10:pp.  459-474]  and  may  cause  errors  in  velocity 
estimates  which  also  add  to  the  errors  due  to  the  noise  effects.  Also  as  the  leakage  causes 
significant  nonzero  frequency  components  at  the  undesired  frequencies  (starting  from 
adjacent  frequencies),  it  also  reduces  the  magnitude  of  the  peak  at  the  desired  (or 
accurate)  frequency. 

A  precise  analysis  is  beyond  the  scope  of  the  discussion  here,  but  an  intuitive  idea 
of  what  is  causing  leakage  may  be  obtained  by  looking  at  the  time  domain  representation 
of  sinusoids.  The  primary  source  of  leakage  seen  in  the  FFT  (DFT)  of  these  sinusoids 
are  the  discontinuity  introduced  in  the  periodic  extensions  of  complex  sinusoid  sequences 
by  the  missing  partial  cycles  in  available  data.  Missing  partial  cycles  of  the  periodic 
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sinusoids  occurs  when  the  image  size  is  not  an  integer  multiple  of  the  available  number 
of  frames.  Note  that  the  best  result  occur  in  cases  where  the  number  of  frames  is  equal 
to  the  image  size. 

One  solution  to  the  problems  introduced  above  is  to  use  Spectral  Estimation 
techniques  as  a  last  step  to  extract  the  frequency  information  from  the  available  corrupted 
data.  Besides  the  1-D  FFT  (DFT),  the  Maximum  Entropy  Method  can  be  used  to 
improve  estimates  of  peak  frequencies.  The  Maximum  Entropy  Method  (MEM)  was 
originally  devised  by  Burg  [Ref.  11]  to  overcome  the  fundemental  limitations  of 
Fourier-based  methods  for  estimating  the  power  spectrum  of  complex  sinusoids  in  the 
presence  of  additive  noise. 
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Figure  4.3  The  improved  power  spectrum  with  the  MEM  method. 
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Other  spectral  estimation  techniques  such  as  the  Multiple  Signal  Characterization 
(MUSIC)  Algorithm,  the  Maximum  Likelihood  Method  (MEM)  and  the  Auto  Regressive 
(AR)  Signal  Modelling  were  considered.  The  MEM  was  selected  since  it  has  significantly 
better  resolution  characteristics  than  the  others  when  not  much  data  is  available  [Ref. 
12:pp.  385-392].  In  particular,  the  MEM  avoids  the  use  of  periodic  extension  of  data  or 
of  the  assumption  that  the  data  outside  the  available  record  length  is  zero. 
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V.    THE  3-D  FOURIER  DOMAIN  ANALYSIS  OF  IMAGE  MOTION 

A.      THE  SPATIOTEMPORAL  FREQUENCY  METHOD  (STF) 

This  approach  to  image  flow  derivation  takes  the  advantage  of  the  special  form  of 
the  Fourier  transform  of  time  varying  images  characterizing  constant- velocity,  rigid-body 
translation.  The  method  is  based  on  the  spatiotemporal-frequency  image  representation 
and  is  called  STF  (Spatiotemporal-Frequency  Approach)  [Ref.  13,14].  A  time-varying 
image  function,  moving  in  a  spatial  direction  determined  by  the  velocity  vector  [u  vf 
may  be  defined  as, 

Ax,y,t)  =fix-  ut,y  -  vt), 

=  ftx,y)  *  6(x  -  ut,y  -  vr), 

where  f(x,y)  =  f(x,y,0),  b(x,y)  is  the  Dirac  delta  function  and  "*"  denotes  convolution. 
The  2-D  Fourier  transform  of  Equation  5.1  incorporates  a  phase  term  of  the  form, 

F2-D  i8(x,y,t)}  =  e*"*V*V*  F2_D{g(W)|r=0}, 

(5.2) 


-j2n(wji+wyv)t 


=  e 


F(wx,wJ, 


where  Ffw^wJ  is  the  fourier  transform  of f(x,y).  The  existence  of  this  phase  component 
in  the  above  equation  enables  the  estimation  of  image  velocity.  This  phase  term  is 
linearly  dependent  on  t  and  the  inner  product  of  the  2-D  Fourier  indices  (wx  wJT  with 
the  velocity  vector  (u  v)T.  Determining  the  3-D  Fourier  transform  of  Equation  5.1  only 


48 


(5.3) 


requires  an  extension  of  the  2-D  transform  in  Equation  5.2.  Again  assuming  constant 
velocity,  the  3-D  Fourier  transform  can  be  denoted  by, 

FCw^WyW)  =  F3_D{f(x,y,f)}, 

OS 

=  /  F2_D[fajA)  e'i2™f  dt, 

—OB 

m 

— ,  N  t       -y2K(w_«+w_vy      -jliiwf    j. 

=  FlWfWj  J  e        *   y "  e       *  dt, 

—OP 

=  F(wx,wJ  5(wJtu+w>v+wr). 
As  shown  in  Appendix  B,  this  gives  a  very  interesting  result  that  a  uniformly  translating 
image  with  velocity  [u^vf  has  a  Fourier  spectrum  that  is  zero  elsewhere  in  the  3-D 
spatiotemporal  space  (wz,wy,wj  except  on  the  single  plane  defined  by, 

{    OW^r)   :   UoWx    +  VoWy   +  Wt   =  °   )•  ^54^ 

In  general,  for  each  velocity  [u0,vJT  Equation  5.4  defines  a  unique  planar  locus  of 
spatiotemporal  frequencies  which  are  consistent  with  that  velocity  as  shown  in  Figure 
5.1.  Thus  F(wx,wy,wJ  may  be  visualized  as  a  3-D  solid  of  constant  intensities,  where  the 
slope  of  this  solid  surface  is  given  by  the  values  of  u0  and  Vq.  This  scheme  for  flow 
derivation  however,  does  not  provide  the  generation  of  an  optical  flow  function;  rather 
it  yields  a  single  velocity  characterizing  the  entire  image  for  all  time.  To  make  the 
computation  of  a  time-varying  and  local  image  flow  function  possible,  numerous 
regionally  computed  3-D  Fourier  transforms  could  be  employed  instead  of  just  one  global 
transform.  Then  a  regional  velocity  could  be  determined  for  each  of  the  regional  Fourier 
transforms  in  the  manner  described  above. 
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Figure  5.1   The  velocity  planes  in  the  STF   (spatiotemporal-frequency)  Space. 

The  3-D  Fourier  domain  analysis  of  image  motion  is  useful  for  several  reasons,  in 
particular: 

1)  It   suggests   a   method   for  determining   motion   (velocity)   parameters   by 
examination  of  the  nonzero  region  of  F(wJtwy,wJ. 

2)  It  allows  the  design  of  a  frequency  domain  filter  for  spatiotemporal  interpolation 
of  images  [Ref.  4: pp.  61-66]. 
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B.      VELOCITY  TUNED  FILTERS 

The  form  of  velocity-tuned  filters  is  motivated  by  the  frequency-domain  description 
of  images  which  represent  constant- velocity,  rigid-body  translation  of  objects  [Ref.  15]. 
Although  this  is  a  very  restrictive  assumption,  the  results  form  a  good  foundation  for 
considering  more  general  types  of  motion.  Motivated  by  the  form  of  F(wx,wy,wJ,  which 
defines  a  plane  in  3-dimensional  frequency  space,  a  velocity-tuned  filter  is  defined  as  one 
which  passes  the  Fourier  transform  of  a  constant-velocity  function  with  unity  gain  and 
has  zero  gain  over  the  rest  of  the  frequency  space.  The  filter  will  be  supported  on  a 
unique  planar  surface  defined  by  the  velocity  components  of  the  image.  However,  such 
a  filter  turns  out  to  have  limited  usefulness  and  is  physically  unrealizable.  A  better  and 
more  general  approach  is  to  use  a  finite  bandwidth  filter,  one  which  has  approximately 
unity  gain  in  a  solid  3-dimensional  region  surrounding  the  planar  surface  of  F(wx,wy,wJ 
and  goes  to  zero  outside  that  region.  So,  we  may  consider  the  support  of  the 
3-dimensional  filter  to  be  bounded  by  two  planes  which  are  parallel  to  the  central  plane 
and  separated  in  the  w,  direction  by  a  temporal  bandwidth  of  Fg(wJ.  The  velocity-tuned 
filter  will  also  pass  constant-velocity  functions  whose  velocity  is  different  than  the 
velocity  to  which  the  filter  is  tuned;  this  requires  only  that  the  filter  transfer  function 
H(wx,wy,wJ  is  approximately  constant  over  the  support  of  the  input.  The  range  of  velocity 
over  which  this  is  true  gets  smaller  as  the  temporal  bandwidth  of  Fg(wJ  decreases  and  as 
the  spatial  bandwidth  of  the  input  increases.  Specifically  the  velocity  range  can  be 
defined  as, 
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I  „/  _  „*  I  <  _L         and         |  v/  -  v«  |  <  — L-  (5-5) 

Wx  Wy 

Here  i/,V  are  the  velocity  values  to  which  the  filter  is  tuned  and  U,V  are  the  velocity 
values  of  the  input  [Ref.  15:p.  64],  B,  is  the  temporal  bandwidth  of  Fg(wJ,  and  Wx,Wy  are 
the  spatial  bandwiths  of  the  original  function.  The  quantities  BJWZ  and  B/Wy  collectively 
represent  the  effective  velocity  bandwidth  of  the  filter  when  applied  to  this  input.  When 
the  conditions  of  Equation  5.5  are  not  met,  the  Fourier  transform  of  the  input  intersects 
the  planar  boundaries  of  the  bandpass  filter,  and  the  frequency  components  outside  the 
passband  are  attenuated. 
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VI.    SIMULATION  RESULTS  AND  CONCLUSIONS 

The  algorithms  developed  in  the  previous  chapters  are  simulated  with  artificial  and 
real  image  frames  where  MATLAB  is  used  as  a  tool  for  simulation.  The  artificial  image 
frames  were  32x32  pixels  in  size  and  contained  a  square  to  represent  the  moving  object. 
Sixteen  shades  of  gray  levels  are  used  to  represent  the  pixel  intensity  values  of  the  square 
object  and  the  background,  although  the  intensity  values  are  normalized  before  applying 
each  algorithm.  The  algorithms  are  tested  under  two  different  background  conditions.  For 
the  first  case  the  image  frames  are  assumed  to  have  uniform  background  with  zero 
intensity  and  for  the  second  case  a  random  checkerboard  background  with  half  the 
intensity  of  the  moving  object  is  used.  A  random  checkerboard  background  is  used  in 
order  to  avoid  any  regular  or  periodic  pattern  in  the  background.  The  artificial  images 
used  in  the  simulations  are  shown  in  Figures  6.1-6.4. 

Throughout  the  simulations,  an  artificial  disturbance  (noise)  is  added  to  the  image 
frames  (pixel  values).  The  Relative  Error  in  the  estimation  of  velocity  components  is 
used  as  a  comparison  criteria  for  different  SNR  and  different  algorithms.  The  SNR  (in 
decibels,  dB)  for  an  NxN  image  is  defined  as, 


SNR  =  10Log 


10 


/  N-l  N-l 

E  E^.y)2 

x=0  y=0 

N-l  N-l 

E  E  "(*.>-)2 

^  x=0  y=0 


(6.1) 
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where  f(x,y)  is  the  desired  component  (actual  pixel  value),  and  n(x,y)  is  the  undesired 
component  (noise).  The  final  pixel  values  of  the  image  frames  are  determined  by  the  sum 
of  these  two  terms.  The  Relative  Error  is  defined  as, 

Relative  Error  = -  (6.2) 

M2 

where  u  is  the  actual  value  of  velocity  component,  u  is  the  estimated  velocity  component 

and  ||  •  ||  2  defines  the  Frobenius  norm  of  a  vector  or  matrix.  Each  algorithm  is  tested 

4000-8000  times  with  very  slowly  changing  SNR's.  The  resulting  values  of  these  tests 

are  averaged  over  larger  changes  of  SNR's.  In  each  test,  the  algorithms  are  iterated  32 

times.  Since  the  image  size  was  32,  the  moving  object  is  simulated  such  that  as  it 

disappeared  from  one  end  of  the  image  frame  it  is  allowed  to  reappear  at  the  other  end. 

These  algorithms  are  also  tested  with  a  real  image  sequence  containing  a  moving 

car.  This  real  image  was  128x128  pixels  in  size  and  contained  256  shades  of  gray  to 

represent  the  pixel  values.  These  tests  are  done  just  to  see  if  the  algorithms  would  work 

with  real  images.  Also  since  the  EKF  and  the  3-D  FFT  algorithm  (with  Velocity  Tuned 

Filter)  promises  restoration  of  the  image  and  increase  in  SNR's,  these  two  algorithms  are 

tested  with  the  real  image  sequences. 
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Figure  6.1  The  test  image  frame  containing  a  square  object  without  noise. 


Figure  6.2  The  test  image  frame  containing  a  square  object  with  noise. 
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Figure  6.3  The  test  image  frame  including  background  without  noise. 


Figure  6.4  The  test  image  frame  including  background  with  noise. 
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A.      THE  EXTENDED  KALMAN  FILTER 

The  extended  Kalman  filter  is  implemented  with  a  3rd  order  filter  for  images  with 
zero  (or  uniform)  background  images  and  with  a  5th  order  filter  for  images  with  nonzero 
background  as  was  discussed  in  Chapter  II. 

Two  sample  test  results  of  the  EKF  algorithm  are  shown  in  Figures  6.5-6.6.  The 
EKF  gives  good  results  even  under  fairly  low  SNR  conditions.  In  general  at  high  SNR 
situations  the  EKF  converges  to  the  actual  velocity  values  after  3-4  iterations  and  as  the 
SNR  decreases  the  iterations  needed  increases  up  to  25-30  iterations.  At  very  low  SNR 
conditions  (-30,-40  dB),  the  EKF  estimate  of  the  velocity  components  is  not  guaranteed 
and  depends  on  the  random  disturbance.  Figure  6.7  shows  the  Average  Relative  Error 
(computed  by  averaging  the  relative  error  over  32  iterations)  and  the  Final  Relative  Error 
(the  relative  error  in  the  final  velocity  estimate)  versus  the  SNR  of  the  input  image 
frames.  Figure  6.8  shows  the  Relative  Error  versus  Iterations  for  some  selected  values 
of  SNR.  Figures  6.9-6.10  shows  similar  results  where  the  image  frames  included  the 
background  (5*  order  EKF).  As  seen  from  these  two  sets  of  results  the  background 
effected  the  performance  of  the  algorithm.  There  are  mainly  two  reasons  for  that.  First, 
the  occluding  boundaries  create  a  noise  term  which  is  not  included  in  the  SNR  term. 
Second,  the  background  which  has  much  more  signal  power  than  the  moving  object, 
makes  it  harder  to  detect  the  phase  changes  due  to  the  moving  object.  Similar  effects  are 
seen  in  other  algorithms  that  use  Fourier  phase  changes  as  a  basis  for  velocity  estimation. 
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THE  EXTENDED  KALMAN  FILTER 
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Figure  6.5  The  EKF  velocity  estimates  for  a  high  SNR  case. 
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Figure  6.6  The  EKF  velocity  estimates  for  a  low  SNR  case. 
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Figure  6.7  The  Relative  Error  vs.  SNR  with  a  3rd  order  EKF. 
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Figure  6.8  The  Relative  Error  vs.  Iterations  for  a  3rd  order  EKF. 
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Figure  6.9  The  Relative  Error  vs.  SNR  for  a  5th  order  EKF. 
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B.      THE  IMAGE  FLOW  CONSTRAINT  EQUATION  METHODS 

The  iterative  solution  given  by  Equations  3.16  -  3.17  and  Constraint  Line 
Clustering  method  are  used  for  the  computation  of  the  Image  Flow  Constraint  equation 
in  the  simulations.  The  iterative  solution  method  is  implemented  in  two  different  ways. 
For  the  first  case  only  two  frames  are  used  and  the  computations  are  iterated  for  1,  4, 
16,  and  32  times.  The  estimated  velocity  fields  are  shown  as  arrow  diagrams  in  Figure 
6.11.  Initially,  the  velocity  estimates  are  assumed  to  be  zero.  Consequently,  the  first 
iteration  shows  vectors  where  the  temporal  changes  occur.  Later  the  estimates  approach 
the  correct  values  in  most  parts  of  the  image.  A  few  changes  occur  after  16  iterations. 
In  the  next  case,  one  iteration  is  used  between  two  consecutive  frames  (per  time  step). 
The  resulting  velocity  fields  are  shown  in  Figure  6.12  for  1,  4,  16,  and  32  time  steps. 
The  estimates  approached  the  correct  values  more  rapidly  and  very  few  changes  occur 
after  16  time  steps.  This  leads  to  the  final  method  where  the  velocity  estimates  are 
computed  4  times  between  two  consecutive  frames  with  16  or  32  time  steps.  The  velocity 
fields  after  1,  4,  16,  and  32  time  steps,  where  4  iterations  per  time  step  are  used,  is 
shown  in  Figure  6.13.  The  final  estimates  of  the  velocity  components  of  the  moving 
object  are  computed  simply  by  averaging  the  velocity  field  in  pixel  locations  that  have 
significant  temporal  changes  (i.e.  where  the  moving  object  is).  This  created  a 
thresholding  problem  to  determine  the  region  of  motion  especially  for  the  high  noise 
case.  Since  both  the  artificial  image  and  the  real  image  had  higher  intensity  values  at  the 
object  points  than  the  background,  the  problem  is  solved  rather  easily.  But  as  the  noise 
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level  is  increased  to  higher  values,  the  error  in  computing  the  regions  of  motion 
increased  the  error  in  the  velocity  estimates.  Since  the  new  estimates  at  a  particular  point 
do  not  directly  depend  on  the  previous  estimates  at  the  same  point,  increasing  the  time 
steps  does  not  effect  (or  improve)  the  final  velocity  estimates.  Although  it  improves  the 
estimate  of  the  velocity  field. 

Constraint  Line  Clustering  methods  do  not  use  a  previous  velocity  estimate  at  all. 
Consequently  it  is  computed  between  two  consecutive  frames.  Again  the  final  velocity 
is  estimated  by  averaging  the  velocity  field  in  the  region  of  motion.  Figure  6.14  shows 
the  velocity  field  computed  by  the  Constraint  Line  Clustering  algorithm. 

Both  algorithms  show  good  performance  when  the  noise  levels  are  low  even  if  the 
image  has  a  finite  number  of  discontinuities  (or  when  the  image  is  not  smooth  over  all). 
However,  the  performance  decreases  quickly  with  increasing  noise  levels.  Therefore,  the 
image  frames  are  smoothed  with  a  Gaussian  lowpass  filter  implemented  in  the  frequency 
domain  to  improve  the  velocity  field  estimates.  The  Relative  Error  vs.  SNR  for  these 
algorithms  are  shown  in  Figures  6.15-6.16  for  image  frames  with  2  different  choices  of 
backgrounds  respectively. 
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Figure  6.11  The  velocity  field  after  1,  4,  16,  and  32  iterations. 
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Figure  6.12  The  velocity  field  after  1,  4,  16,  and  32  time  steps. 
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Figure  6.14  The  velocity  field  with  Constraint  Line  Clustering. 
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Figure  6.15  The  Relative  Error  vs.  SNR  for  Image  Flow  Constraint  Eq.  solutions. 
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C.      THE  SPATIOTEMPORAL  FREQUENCY  (3-D  FFT)  ALGORITHM 

This  algorithm  requires  a  large  memory  space  since  the  previous  image  frames  (or 
their  2-dimensional  FFT's)  need  to  be  saved  in  order  to  perform  the  FFT  in  the  third 
dimension.  Also,  the  definition  and  detection  of  the  plane  in  the  3-dimensional 
spatiotemporal  frequency  space  is  another  problem  to  be  solved.  In  the  simulations  the 
planes  are  defined  by  their  slopes  in  the  wx,  and  wy  directions.  These  slopes  are  extracted 
by  detecting  the  slope  of  the  lines  created  by  the  intersection  of  the  defined  plane  with 
the  vertical  planes  normal  to  the  wx,  and  wy  axes.  The  slopes  are  determined  as  integer 
values  which  caused  the  final  velocity  estimates  to  be  integer  values  as  well.  This  is  an 
acceptable  situation  since  the  actual  velocity  values  are  accepted  as  integer  values  too. 
The  velocity  estimates  are  affected  by  the  number  of  frames  available.  Although  it  was 
not  mentioned  before,  the  effect  of  smearing  (or  leakage)  is  seen  around  the  central 
plane  because  of  missing  image  frames.  Besides  the  expected  nonzero  central  plane, the 
3-D  FFT  creates  other  planes  which  are  parallel  to  this  central  plane.  These  planes  are 
also  supported  by  the  noise  terms.  One  test  of  velocity  estimation  with  the  3-D  FFT 
algorithm  is  shown  in  Figure  6. 17.  Figures  6. 18-6. 19  show  the  Relative  Error  values  vs. 
SNR  and  number  of  iterations  for  this  algorithm. 

The  3-D  FFT  algorithm  can  be  used  for  images  with  nonzero  background  as  well. 
In  this  case,  the  algorithm  creates  a  second  nonzero  plane  with  the  slope  of  zero  due  to 
the  constant  background.  The  actual  plane  can  be  determined  by  deleting  the  plane 
associated  with  the  constant  background  first.    The  smearing  effect  is  also  seen  around 


68 


the  plane  associated  with  the  constant  background.  This  makes  the  detection  of  the 
desired  plane  harder  and  creates  errors  in  the  velocity  estimates.  The  simulation  results 
of  the  Spatiotemporal  Frequency  algorithm  with  image  sequences  including  background 
are  shown  in  Figures  6.20-6.21. 
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Figure  6.17  Velocity  estimates  with  the  Spatiotemporal  Frequency  algorithm. 
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Figure  6.18  The  Relative  Error  vs.  SNR  for  3-D  FFT  algorithm. 
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Figure  6.20  The  Relative  Error  vs.  SNR  for  3-D  FFT  Algorithm  (with  background). 
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Figure  6.21  The  relative  Error  vs  Iterations  for  3-D  FFT  algorithm  (with  background). 
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D.      THE  1-DIMENSIONAL  FFT  ALGORITHM 

This  algorithm  is  the  simplest,  as  mentioned  before,  but  still  gives  fairly  good 
velocity  estimates.  It  takes  much  less  computation  time  since  it  uses  only  one-dimensional 
Fourier  transforms.  Throughout  the  simulations,  the  estimates  of  this  algorithm  are  also 
improved  in  two  steps.  First,  the  velocity  components  are  found  by  averaging  the  results 
for  3  k„  values  (k„  =  1,2,3).  Second,  the  velocity  components  are  computed  by  Spectral 
Estimation  (MEM)  methods  as  discussed  in  Chapter  IV  for  k„=l.  The  final  velocity  is 
estimated  by  comparing  the  MEM  estimate  with  the  average  estimates.  A  sample  test 
result  is  shown  in  Figure  6.22.  For  high  SNR,  even  two  frames  are  enough  to  yield  a 
solution.  But  as  the  noise  level  increases  more  frames  are  needed.  Overall  this  algorithm 
gives  better  results  for  the  estimation  of  motion  parameters  than  the  other  algorithms. 
The  Relative  Error  vs.  SNR  and  Iterations  are  plotted  in  Figures  6.23-6.26  for  two 
choices  of  background. 


72 


1  -D  FFT  ALGORITHM 


5 


15  20 

ITERATIONS 


25 


30 


Figure  6.22  Velocity  estimates  with  1-D  FFT  algorithm. 
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Figure  6.23  The  Relative  Error  vs.  SNR  for  1-D  FFT  algorithm. 
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Figure  6.24  The  Relative  Error  vs.  Iterations  for  1-D  FFT  algorithm. 
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Figure  6.25  The  Relative  Error  vs.  SNR  for  1-D  FFT  algorithm  (with  background). 
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Figure   6.26  The  Relative  Error   vs.    Iterations   for   1-D  FFT  algorithm    (with 
background). 
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E.      RESTORATION  AND  IMAGE  ENHANCEMENT 

The  test  image  frames  are  restored  by  using  the  estimates  of  the  EKF  and  by  using 
the  Velocity  Tuned  Filter  (VTF)  as  outlined  in  Chapter  V.  The  artificial  images  with 
decreasing  SNR  are  used  in  the  simulations.  After  each  simulation,  the  SNR  of  the 
restored  (or  output)  image  frames  are  computed  and  compared  with  the  SNR  of  the  noisy 
(or  input)  image.  The  difference  between  these  SNR  values  are  plotted  in  Figures  6.30- 
6.31  to  show  the  increase  (or  decrease)  in  the  image  quality.  As  seen  from  these  plots 
the  EKF  does  not  improve  the  image  quality  for  images  with  SNR's  higher  than  10  dB. 
This  occurs  since  some  of  the  higher  spatial  frequencies  are  truncated  in  the  EKF 
algorithm.  Similiarly,  the  results  show  an  increase  in  image  quality  for  images  with  SNR 
terms  lower  than  -20  dB.  However  when  the  restored  images  are  checked,  it  is  seen  that 
the  increase  in  SNR  is  not  a  real  increase  in  image  quality,  instead,  it  is  an  increase 
related  to  lowpass  filtering  of  the  image  caused  by  truncating  frequencies.  The  same 
effect  is  seen  for  VTF  after  -25  dB  SNR's.  Figures  6.26-6.29  show  the  original,  noisy, 
and  the  restrored  image  frames  of  a  sample  run  of  these  algorithms. 
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Figure  6.26  The  original  image  frame  without  noise. 


Figure  6.27  The  original  image  with  additive  noise  (SNR  =  -15  dB). 
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Figure  6.28  The  restored  image  with  EKF  (SNR  =  -10  dB). 


Figure  6.29  The  restored  image  vith  VTF  (SNR  =  -7.5  dB). 
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F.      CONCLUSIONS 

As  seen  from  the  simulation  results,  the  1-D  FFT  algorithm  gives  the  best  result 
for  the  estimation  of  velocity  components,  although  this  algorithm  does  not  preserve  the 
original  image.  The  EKF  gives  the  second  best  result  for  the  estimation  of  velocity 
components  for  low  SNR  images  and  also  increases  the  image  quality.  The  3-D  FFT 
(STF)  algorithm  does  not  perform  as  well  as  the  others  but  promises  an  increase  in  the 
image  quality  with  Velocity  Tuned  Filters  (VTF).  The  algorithms  that  implement  the 
Image  Flow  Constraint  equation  performed  after  smoothing.  Additionally,  these 
algorithms  produce  a  velocity  field  besides  only  the  estimation  of  a  single 
two-dimensional  velocity  component. 
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Appendix  A.  Derivation  of  Image  Flow  Constraint  Equation 

Let  E(x,y,t)  be  the  irradiance  at  time  t  at  the  image  point  (x,y).  Then  if  the  image 
point  is  displaced  a  distance  5x  in  the  x-direction  and  dy  in  the  y-direction  in  time  5t,  we 
expect  that  the  irradiance  will  be  the  same  at  time  t+St  at  the  point  (x+5x,y+Sy).  So, 

E(x,y,t)  =  E(x+bx,y+by,t+bf).  (A1) 

The  right  hand  side  of  Equation  A.  1  can  be  expanded  about  a  point  (x,y,t)  using  Taylor 
series  and  so  obtain, 

E(x,y,t)  *  -p5x  +  -^6y  +  ^§bt  +  e  =  E(x,y,t),  (A.2) 

ox  dy  dt 

where  e  contains  second  and  higher  order  terms  in  5x,5y  and  St.  After  substracting 
E(x,y,t)  from  both  sides  and  dividing  through  by  5t,  the  result  is, 

dEbx  +  dEby  +  dE  =  (A  3) 

dx  bt        dy  bt        dt 

where  co(5t)  is  a  term  of  order  dt  that  includes  second  and  higher  variations  of  x  and  y. 
In  the  limit  as  5t  -»  0,  the  above  equation  becomes, 

Ex  u  +  Ev  v  +  E  =  0.  (A.4) 

This  equation  is  called  the  image  flow  constraint  equation  as  mentioned  before. 
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Appendix  B.  Fourier  Transform  of  a  Linearly  Translating  image 
Let  f(x,y,t)  represent  the  time  varying  image  function  that  contains  a  uniformly 
translating  object  as  given  by  Equation  3.1, 


Ax,y,t)  =Ax-vj,y-vyf) 

=Ax,y)  *  ?>(x-vxt,y-vyt) 


(B.l) 


The  3-dimensional  Fourier  transform  of f(x,y,t)  is  given  by, 


FK,w,,«-,)  =  /  /  /  toV*'  * ""'  "w')  dx  dy  dt.  (B.2) 


Using  the  representation  given  in  Equation  B.  1  gives, 


and 


Fiw^w^w^fff^)*b(pc-v^-vfi]e^^^dxdyA,  (B.3) 


F(wxtWyW)= ]  e'*"' 


//[My)  *  6(x-vxt,y-vyt)]e-j^^dxdy L        ^4) 


F(wxiwy,w)=fe'jtw'  hwx,wy)f{b(x-vxt,y-vyt)  e'*™' +  **  dx  dy\  dt.    (B'5> 


Performing  the  integrals  inside  the  brackets  gives, 


F K,w>>w,)  =  fe  -""■  {  F(wz,Wy)  e  *«  *  V*  }  *,  (B.6) 
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and, 

F{wx,wy,w)  ^(w^jfe-^Wt'dt.  (B.7) 

This  yields  the  final  result  as, 

^(vVvyvv,)  =  F(wx,wy)  *>(vxwx+vywy+wt).  (B.8) 
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